Abstract: We investigate the solution of a large class of
fixed-final-state  optimal  control  problems  by  a  group  of 
cooperating  dynamical  systems.  We  present  a  pursuit-based 
algorithm  -  inspired  by  the  foraging  behavior  of  ants  -  that 
requires  each  system-member  of  the  group  to  solve  a  finite 
number  of  optimization  problems  as  it  follows  other  members 
of  the  group  from  a  starting  to  a  final  state.  Our  algorithm, 
termed  “sampled  local  pursuit”,  is  iterative  and  leads  the 
group  to  a  locally  optimal  solution,  starting  from  an  initial 
feasible  trajectory.  The  proposed  algorithm  is  broad  in  its 
applicability  and  generalizes  previous  results;  it  requires  only 
short-range  sensing  and  limited  interactions  between  group 
members,  and  avoids  the  need  for  a  “global  map”  of  the 
environment  or  manifold  on  which  the  group  evolves.  We 
include  simulations  that  illustrate  the  performance  of  our 
algorithm. 

I.  INTRODUCTION 

In  nature,  many  animal  groups  exhibit  highly  organized 
and  efficient  “collective  behaviors”,  despite  their  members’ 
limited  intelligence.  For  instance,  worker  honey  bees  can 
coordinate  their  distribution  among  different  flowers  in 
accordance  with  the  profitability  of  each  source;  a  school 
of  fish  can  move  together  in  a  tight  formation;  ants  can 
recruit  nest-mates  to  form  efficient  foraging  trails  [1],  [2], 
[3].  These  examples  illustrate  how  aggregate  behavior  may 
be  qualitatively  different  from  individual  actions  and  that 
cooperation  among  members  of  a  natural  collective  helps 
them  overcome  their  limitations  and  accomplish  complex 
tasks  that  may  be  impossible  for  them  to  attain  individually. 

Observations  of  the  qualitatively  similar  behaviors  of 
members  of  animal  groups,  coupled  with  their  cognitive 
and  physical  limitations,  support  the  conclusion  that  their 
collective  efficiency  and  elegance  are  self-organized  and 
must  be  “encoded”  in  fairly  simple  patterns  (as  far  as 
individual  actions  are  concerned),  in  contrast  to  the  complex 
performance  of  the  group.  Moreover,  many  of  the  tasks 
performed  by  natural  groups  are  functionally  similar  to 
what  one  might  require  from  engineered  collectives.  In 
some cases, members of a biological group and those
of a decentralized group of autonomous systems operate
under similar constraints, in the sense that both are
usually equipped with limited sensing, communication and
computing capabilities.

The potential of a group to “be more than the sum
of its members” has already seeded a variety of recent
research directions in the systems and control community, from
modeling  of  animal  groups  [1],  [4],  [5],  to  distributed 
collective  covering  and  searching  [7],  [8],  estimating  by 
groups  [9],  [10],  biologically-motivated  optimization  [11], 
[12]  and  cooperative  robotic  teams  [13],  [14].  The  objective 
of  this  paper  is  to  investigate  the  biologically-inspired 
cooperative  solution  to  a  class  of  optimal  control  problems 
with  fixed  final  states.  We  are  particularly  interested  in 
applying  models  of  the  foraging  behavior  of  ants,  which 
are  well-known  path  optimizers  (see  for  example  [1]). 

One of the early optimization methods inspired by trail
formation in ants was presented in [4], where it was
shown that ants that “pursued” one another on $\mathbb{R}^2$ (each
pointing its velocity vector towards a predecessor) had the
effect of producing progressively “straighter” trails. That
idea was later extended to kinematic vehicles moving on
non-Euclidean environments [6]. Both these works were
restricted exclusively to the “discovery” of geodesics, meaning
that the autonomous systems making up the group had
very simple dynamics with no drift terms. In this paper we
show that the earlier work can be generalized to a much
broader class of optimal control problems, and to agents$^1$ with
non-trivial dynamics. We propose an iterative, decentralized
algorithm that involves “local pursuit” (to use the term
coined in [4]) of members of a collective, this time in a
broader and more intricate setting. Our algorithm has lower
computational requirements than previous “continuous pursuit”
formulations and requires agents to communicate with
their neighbors a finite number of times. The agents do not
need a global map of their environment or even an agreed-upon
common coordinate system. The proposed algorithm
is most useful in trajectory optimization problems that are
easier to solve when boundary conditions are “close” to one
another (because of, for example, the agents’ computational
or sensing limitations), with the term “close” taken to
include not only geographical separation but also distance
on the manifold on which copies of a dynamical system
evolve.

The remainder of this paper is organized as follows: In
Sec. II we describe the class of optimal control problems we
are concerned with and propose an iterative, decentralized
solution, termed “sampled local pursuit”, which is inspired
by the foraging behavior of ants. Section III contains the
main results discussing the behavior of a collection of
control systems evolving under the proposed algorithm. Sec.
IV presents a pair of simulations designed to illustrate the
performance of sampled local pursuit.

$^1$ Throughout the paper we will use “agent” to refer to a member of a
group of control systems.

II.  A  DISCRETE  BIO-INSPIRED  ALGORITHM 
FOR  OPTIMAL  CONTROL 

We  are  interested  in  optimal  control  problems  using  a 
group  of  cooperating  “agents”.  For  our  purposes,  each  agent 
is  a  “copy”  of  a  dynamical  system: 

$$\dot{x}_k = f(x_k, u_k), \quad x_k(t) \in \mathbb{R}^n, \ u_k(t) \in \Omega \subset \mathbb{R}^m \tag{1}$$

for $k = 0, 1, 2, \ldots$. Physically, each copy of (1) could
stand for a robot, UAV or other autonomous system. Each
$x_k(t) : [0,T] \to \mathbb{R}^n$ represents a trajectory defined by the
$k$th agent’s evolution.
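To make the notion of “copies” of (1) concrete, the following is a minimal sketch in which several agents share one vector field $f$ and are integrated with a forward-Euler scheme. The function names, the Euler discretization, and the single-integrator choice $f(x,u) = u$ are illustrative assumptions, not part of the paper.

```python
from typing import Callable, List, Sequence

State = List[float]

def simulate(f: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
             x_init: Sequence[float],
             u: Callable[[float], Sequence[float]],
             T: float,
             steps: int) -> List[State]:
    """Forward-Euler integration of x' = f(x, u(t)) on [0, T] (illustrative only)."""
    h = T / steps
    x = list(x_init)
    traj = [list(x)]
    for j in range(steps):
        dx = f(x, u(j * h))
        x = [xi + h * di for xi, di in zip(x, dx)]
        traj.append(list(x))
    return traj

def single_integrator(x: Sequence[float], u: Sequence[float]) -> Sequence[float]:
    """Hypothetical drift-free dynamics x' = u, the simplest instance of (1)."""
    return list(u)

# Three agents, i.e. three copies of the same dynamical system.
# They are driven by the same constant control here purely for illustration.
agents = [
    simulate(single_integrator, [0.0, 0.0], lambda t: [1.0, 0.5], T=2.0, steps=200)
    for _ in range(3)
]
```

Each entry of `agents` is one sampled trajectory $x_k(t)$; in the algorithm discussed later, the agents differ only in the controls they apply, not in $f$.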

A.  Problem  Statement  and  Notation 

Assume that there is a pair of states $x_0$ and $x_f$ which
are equilibrium points$^2$ of (1) for $u = 0$. The problem
we are concerned with is finding a trajectory $x^*(t)$ ($t \in [0,T]$,
$T$ fixed) that minimizes

$$J(x, \dot{x}, t_0, T) = \int_{t_0}^{t_0+T} g(x(t), \dot{x}(t))\,dt \tag{2}$$

with $x(t_0) = x_0$, $x(t_0 + T) = x_f$, $g(\cdot,\cdot) > 0$, and subject
to (1).

It will be convenient to define the following notation. Let
$\mathbb{D} \subset \mathbb{R}^n$ be a domain containing states $a$ and $b$. Assume
$0 < \sigma \le T$ and $t_0 \ge 0$. The optimal trajectory from $a$ to
$b$ in fixed $T$ units of time will be denoted by $x^*(t)$ ($t \in [t_0, t_0+T]$),
satisfying:

$$J(x^*, \dot{x}^*, t_0, T) = \min_{x} J(x, \dot{x}, t_0, T), \tag{3}$$

subject to $x(t_0) = a$, $x(t_0 + T) = b$. We define the cost of
following the optimal trajectory from $a$ to $b$ for $\sigma$ units of
time by:

$$\eta(a, b, T, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x^*(t), \dot{x}^*(t))\,dt, \quad \sigma \le T \tag{4}$$

where the optimal trajectory $x^*(t)$ is defined in (3).

For a generic trajectory $x(t)$ of (1), we define

$$C(x, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x(t), \dot{x}(t))\,dt \tag{5}$$

to be the cost incurred along $x(t)$ during $[t_0, t_0 + \sigma)$.

The following facts follow easily from the properties
of optimal trajectories and will be used in later arguments.

Fact: Let $\eta, C$ be defined by (4), (5), let $x_k(t)$ be a
trajectory of (1) and $x^*(t)$ an optimal trajectory of (3).
Then:

1) $\eta(a,b,T,t_0,\sigma) \le C(x_k,t_0,\sigma)$ for any $x^*(t)$ satisfying
(3) with $x_k(t_0) = x^*(t_0)$, $x_k(t_0+\sigma) = x^*(t_0+\sigma)$.

2) $\eta(a,c,T,t_0,T) \le \eta(a,b,\sigma,t_0,\sigma) + \eta(b,c,T-\sigma,t_0+\sigma,T-\sigma)$

3) $C(x_k,t_0,T) = C(x_k,t_0,\sigma) + C(x_k,t_0+\sigma,T-\sigma)$

4) $\eta(a,b,T,t_0,\sigma) = C(x^*,t_0,\sigma)$.

$^2$ Without loss of generality we assume that $u = 0$ at those equilibria.
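The cost notation and two of the facts listed above can be checked numerically on a simple case. The sketch below assumes a single integrator $\dot{x} = u$ with running cost $g = \|\dot{x}\|^2$ (our illustrative choice, not an example from the paper). For that system the optimal trajectory between two states over a fixed time $\sigma$ is a constant-speed straight line, so $\eta(a,b,\sigma,t_0,\sigma) = \|b-a\|^2/\sigma$. The curve, the midpoint quadrature, and all function names are hypothetical.

```python
import math

def x(t):
    """A curved (sub-optimal) trajectory from (0, 0) to (1, 0) on [0, 1]."""
    return (t, math.sin(math.pi * t))

def xdot(t):
    """Time derivative of the trajectory above."""
    return (1.0, math.pi * math.cos(math.pi * t))

def g(pos, vel):
    """Assumed running cost: squared speed."""
    return vel[0] ** 2 + vel[1] ** 2

def C(t0, sigma, n=2000):
    """Cost (5) along x(t) on [t0, t0 + sigma], by the midpoint rule."""
    h = sigma / n
    return h * sum(g(x(t0 + (j + 0.5) * h), xdot(t0 + (j + 0.5) * h))
                   for j in range(n))

def eta_straight(a, b, sigma):
    """Optimal cost from a to b in time sigma for x' = u, g = |x'|^2."""
    return math.dist(a, b) ** 2 / sigma

# Additivity of the cost along a trajectory (item 3 of the Fact).
additivity_gap = abs(C(0.0, 1.0) - (C(0.0, 0.3) + C(0.3, 0.7)))

# Optimal cost between two points never exceeds the cost of a given
# trajectory joining them (item 1 of the Fact, special case T = sigma).
chord_cost = eta_straight(x(0.0), x(0.3), 0.3)
curve_cost = C(0.0, 0.3)
```

Here `additivity_gap` is zero up to quadrature error, and `chord_cost` is smaller than `curve_cost` because the straight chord is the optimal connection for this particular cost.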

B.  A  Pursuit-based  Optimal  Control  Algorithm 

We assume that we have available an initial feasible (but
sub-optimal) control/trajectory pair $(u_{feas}(t), x_{feas}(t))$ for
(1), obtained through a combination of a-priori knowledge
about the problem and/or random exploration. We consider
the formation of an ordered sequence of agents, with each
agent trying to reach its predecessor along an optimal
trajectory. The sequence is initiated with the first agent
following $x_{feas}$ to the desired final state. The precise rules
that govern the movement of each agent are:

Algorithm (Sampled Local Pursuit): Identify two
states $x_0$ and $x_f$ on $\mathbb{D}$. Let $x_0(t)$ ($t \in [0,T]$) be an initial
trajectory satisfying (1) with $x_0(0) = x_0$, $x_0(T) = x_f$.
Choose $\Delta, \delta \in \mathbb{R}$ such that $0 < \delta < \Delta < T$. Then:

1) For $k = 1, 2, 3, \ldots$, let $t_k = k\Delta$ be the starting time
of the $k$th agent, i.e. $u_k(t) = 0$, $x_k(t) = x_0$ for
$0 \le t \le t_k$.

2) When $t = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$, calculate the
control $u_t^*(\tau)$ that achieves (subj. to (1)):

$$\begin{cases} \eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), \ \tau \in [t, t+\Delta] & \text{if } \Delta + i\delta < T \\ \eta(x_k(t), x_f, \lambda, t, \lambda), \ \tau \in [t, t_k+T] & \text{otherwise} \end{cases}$$

where $\lambda = t_k + T - t$.

3) Apply $u_k(t) = u^*_{t_k+i\delta}(t - t_k - i\delta)$ to the $k$th agent
for $t \in [t_k + i\delta, t_k + (i+1)\delta)$ if $\Delta + i\delta < T$, or for
$t \in [t_k + i\delta, t_k + T)$ otherwise.

4) Repeat from step 2 until the $k$th agent reaches $x_f$.

There are two adjustable parameters in the sampled
local pursuit (SLP) algorithm: the “following interval” $\Delta$,
which sets the time between the departures of successive agents
from $x_0$, and the “updating interval” $\delta$, which sets how
often an agent samples the state of its
predecessor. To illustrate the pursuit process, we refer to
the $(k-1)$th agent as the “leader” and to the $k$th agent as
the “follower”. We will refer to the times $t_k^i = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$
as the “updating times”. At every updating
time, the follower finds an optimal trajectory from itself to
its leader, and moves on it during $[t_k^i, t_k^i + \delta]$, until the next
updating time. This process continues until the follower
reaches the final state. Usually we take $0 < \delta < \Delta$ so that
each agent only needs to solve a finite number of optimal
control problems. If the problem in question can be solved
efficiently, one may choose to decrease $\delta$. In fact, the case
$\delta \to 0$ leads to a continuous version of the SLP algorithm
(of which [6] and [4] are special cases), where each agent is
constantly updating its trajectory in response to its leader’s
movement. Details can be found in [16]. In the next section
we show that SLP leads to (locally) optimal trajectories.
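The pursuit rules can be sketched in a few lines for the simplest possible setting. The code below assumes a planar single integrator $\dot{x} = u$ with running cost $\|\dot{x}\|^2$ (our illustrative choice), for which every local optimal control problem has a closed-form solution: a constant-speed straight line to the target. Trajectories are stored on a grid of spacing $\delta$, with $\Delta = m\delta$ and $T = N\delta$; the function names, the initial sine-shaped detour, and the parameter values are all hypothetical.

```python
import math

def local_optimum_step(x, target, steps_to_target):
    """For x' = u with cost |u|^2 the optimal path to `target` is a
    constant-speed straight line; return the state one grid step later."""
    return tuple(xc + (tc - xc) / steps_to_target for xc, tc in zip(x, target))

def slp_follower(leader, m):
    """Trajectory of the next agent, given the leader's sampled trajectory.

    leader : list of N+1 states sampled every delta (leader[0] = x_0,
             leader[N] = x_f)
    m      : Delta / delta, i.e. the leader is m grid steps ahead
    """
    N = len(leader) - 1
    x_f = leader[-1]
    traj = [leader[0]]
    i = 0
    while i + m < N:  # corresponds to Delta + i*delta < T
        # Plan an optimal path to the leader's current state, follow it for delta.
        traj.append(local_optimum_step(traj[-1], leader[i + m], m))
        i += 1
    for remaining in range(N - i, 0, -1):
        # The leader rests at x_f; head straight there in the remaining time.
        traj.append(local_optimum_step(traj[-1], x_f, remaining))
    return traj

def cost(traj, delta):
    """Cost of the piecewise-linear trajectory: sum of |dx|^2 / delta."""
    return sum(math.dist(p, q) ** 2 for p, q in zip(traj, traj[1:])) / delta

T, N, m = 1.0, 10, 5                      # delta = 0.1, Delta = 0.5
delta = T / N
x_0, x_f = (0.0, 0.0), (1.0, 0.0)
interior = [(j / N, math.sin(math.pi * j / N)) for j in range(1, N)]
traj = [x_0] + interior + [x_f]           # initial feasible, sub-optimal path
costs = [cost(traj, delta)]
for k in range(60):                       # 60 successive followers
    traj = slp_follower(traj, m)
    costs.append(cost(traj, delta))
print(round(costs[0], 3), round(costs[-1], 6))
```

In this setting the sequence `costs` is non-increasing and approaches $\|x_f - x_0\|^2 / T = 1$, the cost of the straight constant-speed path. For systems with drift or non-trivial costs, `local_optimum_step` would have to be replaced by a numerical solver for the local two-point problem.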

III.  MAIN  RESULTS 

Recall that the proposed algorithm defines an ordered
sequence of trajectories $\{x_k(t)\}$. We would like to investigate
the properties of the limiting trajectory generated by
the group, i.e. $x_k(t)$ as $k \to \infty$. We begin by discussing
convergence of the iterated trajectories.

Lemma 1: (Convergence of Cost) Assume a group of
agents $x_0, x_1, \ldots, x_k$ evolve under SLP with starting state
$x_0$ and target state $x_f$. Suppose an initial control/trajectory
pair $\{u_0(t), x_0(t)\}$ ($t \in [0,T]$), satisfying $x_0(0) = x_0$
and $x_0(T) = x_f$, is given. If the updating interval satisfies
$0 < \delta < \Delta$, then the cost of the iterated trajectories will
converge, i.e. $\lim_{k \to \infty} C(x_k, t_k, T)$ exists.

Proof: Consider the pursuing process between the
$(k-1)$th and $k$th agents. As shown in Fig. 1, the dotted
line, denoted by $x_{k-1}(t)$ on $[t_{k-1}, t_{k-1}+T]$, indicates the
leader’s path. The solid lines, denoted by $x_k(t)$, are the
realized trajectories of the “follower”, and the dashed lines,
denoted by $\hat{x}_k(t)$, are the planned trajectories along which
the follower plans to move at $t_k + i\delta$ but may not do so
because it will update its future trajectory at $t_k + (i+1)\delta$.

Fig. 1. Illustrating the trajectories of a leader-follower pair during
SLP. The dotted line represents the trajectory of the leader. Solid lines
represent the trajectory of the follower. The dashed lines are trajectory
segments which the follower plans but decides to alter, because of later
measurements of the leader’s state.

For $t \in [t_k, t_k + \delta]$, the follower moves on an optimal trajectory
from state $x_k(t_k)$ to $x_{k-1}(t_k)$ over $\Delta$ units of time. Thus
from Fact 1:

$$\eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \Delta) \le C(x_{k-1}, t_{k-1}, \Delta)$$

At time $t_k + \delta$ the follower reaches the state $x_k(t_k + \delta)$.
Recalling that the trajectory driven by $u^*_{t_k}(t)$ is optimal
from $x_k(t_k)$ to $x_{k-1}(t_k)$, and from Fact 3, we can divide
the cost into two parts, one actual and the other
planned, which are both optimal with respect to their
corresponding end points. That is

$$\begin{aligned} \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \Delta) &= \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) \\ &\quad + \eta(x_k(t_k+\delta), x_{k-1}(t_k), \Delta-\delta, t_k+\delta, \Delta-\delta) \end{aligned}$$


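At time $t_k + \delta$, the follower updates its trajectory to
catch up with the leader at its new location $x_{k-1}(t_k + \delta)$. Because this
trajectory is optimal from $x_k(t_k + \delta)$ to $x_{k-1}(t_k + \delta)$ over
time $\Delta$, any path $x(t)$ ($t \in [t_k + \delta, t_k + \delta + \Delta]$) that goes
from $x_k(t_k + \delta)$ to $x_{k-1}(t_k + \delta)$ over time $\Delta$ and passes
through $x_{k-1}(t_k)$ at time $t_k + \Delta = t_k + \delta + \Delta - \delta$ has
equal or greater cost. From Fact 2 it follows that:

$$\begin{aligned} \eta(x_k(t_k+\delta), x_{k-1}(t_k+\delta), \Delta, t_k+\delta, \Delta) &\le \eta(x_k(t_k+\delta), x_{k-1}(t_k), \Delta-\delta, t_k+\delta, \Delta-\delta) \\ &\quad + C(x_{k-1}, t_k, \delta) \end{aligned}$$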

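From the last equation and the principle of optimality, we
obtain

$$C(x_k, t_k, 2\delta) \le C(x_{k-1}, t_{k-1}, \Delta+\delta) - C(\hat{x}_k, t_k+2\delta, \Delta-\delta)$$

where $\hat{x}_k$ denotes the follower’s planned (not yet realized) trajectory.
We repeat this procedure until $t = t_k + n\delta$, where $\Delta + (n-1)\delta < T$
and $\Delta + n\delta \ge T$. Then

$$\begin{aligned} C(x_k, t_k, n\delta) &= \sum_{i=0}^{n-1} \eta(x_k(t_k+i\delta), x_{k-1}(t_k+i\delta), \Delta, t_k+i\delta, \delta) \\ &\le C(x_{k-1}, t_{k-1}, \Delta+(n-1)\delta) - C(\hat{x}_k, t_k+n\delta, \Delta-\delta) \end{aligned} \tag{6}$$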

When $t \in [t_k + n\delta, t_k + T]$, the leader has reached the final
state and remains there. During this time period, no matter how
many times the follower updates its movement, it will move
on the same path that was determined at time $t = t_k + n\delta$.
This path, which is indicated by the last solid line in Fig.
1, is locally optimal between the states $x_k(t_k + n\delta)$ and
$x_k(t_k + T)$ over $T - n\delta$ units of time. Therefore, with $\hat{x}_k$ the
follower’s planned trajectory,

$$\begin{aligned} C(x_k, t_k+n\delta, T-n\delta) &\le C(\hat{x}_k, t_k+n\delta, \Delta-\delta) \\ &\quad + C(x_{k-1}, t_k+(n-1)\delta, T-(n-1)\delta-\Delta) \end{aligned} \tag{7}$$

From (6), (7) we obtain

$$C(x_k, t_k, T) \le C(x_{k-1}, t_{k-1}, T) \tag{8}$$

Writing $C_k = C(x_k, t_k, T)$ for convenience, we can see
that $C_k \le C_{k-1}$. The sequence $\{C_k\}$ is therefore non-increasing and,
because $g > 0$, bounded below by zero; we
conclude that $\lim_{k \to \infty} C_k$ exists. ■

Of  course,  the  convergence  of  trajectories’  cost  does  not 
imply  the  convergence  of  the  trajectories  themselves.  If 
there  exist  multiple  locally  optimal  trajectories  connecting 
the  leader  and  follower  at  any  updating  times,  then  the 
convergence  of  trajectories  is  not  guaranteed.  However,  if 
we restrict the pursuit process to take place within a “small”
region by selecting $\Delta$ sufficiently small, there will generally
exist a unique locally optimal trajectory from the follower
to the leader at every updating time $t_k + i\delta$, and the agents’
trajectories converge:

Lemma 2: (Uniqueness of the Limiting Trajectory) If at
each updating time, the locally optimal trajectory obtained
through SLP is unique, then the limiting trajectory $x_\infty(t)$
is also unique.

Proof: Suppose there exists more than one limiting
trajectory, for example $x_1(t)$ and $x_2(t)$. Let $x_1(t) \ne x_2(t)$
for $t \in [t_1, t_2] \cup [t_3, t_4] \cup \cdots \cup [t_{n-1}, t_n]$. From Lemma 1,
the two trajectories must have equal costs.

Let a leader $x_{k-1}(t)$ evolve along $x_1(t)$, while the
follower $x_k(t)$ does so along $x_2(t)$. If no update occurs
during $[t_1, t_2]$, then $x_2(t)$ costs less during $[t_1, t_2]$ because
the follower moves along $x_2(t)$ and we have assumed that
the optimal trajectories from follower to leader are unique.
A similar argument on other intervals where $x_1 \ne x_2$ leads
to the fact that the cost along $x_2(t)$ is less than that along
$x_1(t)$ if no update occurs during $t \in [t_1, t_2] \cup [t_3, t_4] \cup \ldots$.
This contradicts the assumption that $x_1$ and $x_2$ have equal
costs.

Next, suppose that the follower updates its trajectory once
during $[t_1, t_2]$, as Fig. 2 illustrates. Separate the curves during
$[t_1, t_2]$ into several segments (which have been labeled 1
through 5), and indicate the cost along curve $i$ as $C_i$. From
the uniqueness of the local optimum, we have $C_1 + C_5 < C_3$
and $C_2 < C_5 + C_4$. Hence $C_1 + C_2 < C_3 + C_4$, which
means $x_2(t)$ has less cost than $x_1(t)$ during $[t_1, t_2]$.

Fig. 2. Illustrating the case of a single trajectory update occurring at a
point where the leader and follower trajectories differ.

A similar argument shows that if there are multiple
updates during $[t_1, t_2]$, the cost along $x_2(t)$ is still less than
that of $x_1(t)$. Iterating on the time intervals during which
$x_1 \ne x_2$ leads to the conclusion that $x_2(t)$ costs less than
$x_1(t)$, which is a contradiction. ■

The  following  definitions  will  be  necessary  for  discussing 
the  properties  of  the  limiting  trajectory. 

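Definition 1: Let $\gamma_1(t)$ and $\gamma_2(t)$ be trajectories of (1),
defined on time intervals $I_1$ and $I_2$ respectively, where $I_1 \cap I_2 \ne \emptyset$.
We say that $\gamma_1$ and $\gamma_2$ overlap if $\gamma_1(t) = \gamma_2(t)$ for
all $t \in I_1 \cap I_2$.

Definition 2: Let $\gamma_1(t)$ and $\gamma_2(t)$ be trajectories of (1),
defined on a time interval $I_1$ and another time interval $I_2$
respectively, where $I_1 \cap I_2 \ne \emptyset$. The composition of $\gamma_1(t)$
and $\gamma_2(t)$ on the interval $I_1 \cup I_2$ is defined as

$$\gamma_1 \circ \gamma_2 \triangleq \begin{cases} \gamma_1(t), & t \in I_1,\ t \notin I_2 - I_1 \cap I_2 \\ \gamma_2(t), & t \notin I_1,\ t \in I_2 - I_1 \cap I_2 \end{cases}$$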

The locally optimal trajectories obtained at every updating
time are smooth for many optimal control problems (e.g.
the solution to the Euler-Lagrange equations). Nonetheless,
$x_k(t)$ is only known to be piecewise smooth. However, we
can show that the limiting trajectory is smooth on the entire
interval $[0, T]$ if the locally optimal trajectories obtained at
every updating time are smooth.

Lemma 3: Suppose that in Lemma 1 the updating interval
$\delta$ and the following interval $\Delta$ satisfy $0 < \delta < \Delta$.
Then for leader-follower pairs that evolve along the limiting
trajectory, the planned trajectories $\hat{x}(t)$ and realized trajectories
$x(t)$ overlap. Furthermore, if the locally optimal
trajectories obtained at every updating time $t_k + i\delta$ are
smooth, then the limiting trajectory is also smooth.

Proof: Consider an agent $x_{k-1}$ that moves along
the limiting trajectory $x_\infty(t)$. This implies that $x_{k-1}(t) = x_k(t + \Delta)$
for all $t \in [t_k, t_k + T]$. First we claim that in the
time interval $[t_k + \delta, t_k + \Delta]$, the planned trajectory agrees
with the realized one, i.e. $\hat{x}_k(t) = x_k(t)$, $t \in [t_k + \delta, t_k + \Delta]$.
Suppose that $\hat{x}_k(t) \ne x_k(t)$ for some $t \in [t_k + \delta, t_k + \Delta]$.
Because $\hat{x}(t)$ is optimal from $x_k(t_k + \delta)$ to $x_k(t_k + \delta + \Delta)$,
the trajectory

$$\begin{cases} \hat{x}_k(t), & t \in [t_k + \delta, t_k + \Delta) \\ x_k(t), & t \in [t_k + \Delta, t_k + \delta + \Delta] \end{cases}$$

has less cost than the trajectory $x_k(t)$ ($t \in [t_k + \delta, t_k + \delta + \Delta]$),
which is updated by the follower at the time $t = t_k + \delta$
and is supposed to be optimal from $x_k(t_k + \delta)$ to
$x_k(t_k + \delta + \Delta)$. Thus there is a contradiction. Hence we
obtain $\hat{x}_k(t) = x_k(t)$ for all $t \in [t_k + \delta, t_k + \Delta]$. The same
argument can be applied to other time periods.

Now, $\hat{x}(t)$ is smooth for $t \in [t_k, t_k + \Delta]$ because the
local optima of (2) are smooth, and $\hat{x}(t)$ is smooth for
$t \in [t_k + \delta, t_k + \delta + \Delta]$ (second update step) for the same reason.
Furthermore, we know that $\hat{x}_k(t) = x_k(t)$ for all $t \in [t_k + \delta, t_k + \Delta]$.
Thus the actual trajectory $x_k(t)$ ($t \in [t_k, t_k + 2\delta]$)
is smooth. Repeating this argument for $t \in [t_k + 2\delta, t_k + 3\delta]$,
etc., leads to the result that the entire trajectory $x_k(t)$
($t \in [t_k, t_k + T]$) is smooth. ■

Before  proceeding  to  the  main  theorem,  we  will  require 
that  the  optimal  cost  in  (2)  changes  “little”  with  small 
changes  to  the  endpoints  of  a  trajectory: 

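Condition 1: Assume there exists an $\epsilon > 0$ such that
for all $a, b_1, b_2 \in \mathbb{D}$ and all $\Delta > 0$, the optimal cost
$\eta(a, b_1, \Delta, 0, \Delta)$ from $a$ to $b_1$ and $\eta(a, b_2, \Delta, 0, \Delta)$ from
$a$ to $b_2$ satisfy

$$\|b_1 - b_2\|_\infty < \epsilon \;\Rightarrow\; \|\eta(a, b_1, \Delta, 0, \Delta) - \eta(a, b_2, \Delta, 0, \Delta)\|_\infty < \mathcal{C}\Delta$$

for some constant $\mathcal{C}$ independent of $\Delta$.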

Piecewise-optimal trajectories are not necessarily optimal.
However, the composition of overlapping optimal
trajectories is locally optimal, if Condition 1 is satisfied.

Lemma 4: (Composition of Optimal Trajectories): Let
$\gamma_1(t)$ and $\gamma_2(t)$ be overlapping locally optimal trajectories
defined on the intervals $I_1$ and $I_2$ respectively, where
$I_1 \cap I_2 \ne \emptyset$. If Condition 1 is satisfied, then the composition
$\gamma_1 \circ \gamma_2$ is locally optimal on $I_1 \cup I_2$.

Proof: It is enough to show that if $x^*(t)$ ($t \in [0, t_1 + \Delta_1]$)
and $x^*(t)$ ($t \in [t_1, T]$) are two locally optimal
trajectories, where $0 < t_1 < t_1 + \Delta_1 < T$, and Condition 1
is satisfied, then the trajectory $x^*(t)$, $t \in [0, T]$ is locally
optimal.

Take $0 < \Delta < \Delta_1$. From the principle of optimality, we
have that $x^*(t)$ ($t \in [0, t_1 + \Delta]$) and $x^*(t)$ ($t \in [t_1, T]$)
are two locally optimal trajectories with respect to their
corresponding end points. If $x^*(t)$ ($t \in [0, T]$) is not a
local optimum, there must exist an $\epsilon' < \epsilon$ and an optimum
$x(t) \in \mathbb{D} \times [0, T]$ satisfying $\|x(t) - x^*(t)\|_\infty < \epsilon'$ and
$C(x(t), 0, T) < C(x^*(t), 0, T)$, as Fig. 3 illustrates.


Fig.  3.  Illustrating  the  proof  of  Lemma  4:  overlapping  optimal  trajectories 
form  a  locally  optimal  trajectory. 

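Construct two optimal trajectories $y_1(t), y_2(t)$ ($t \in [t_1, t_1 + \Delta]$)
connecting $x(t)$ and $x^*(t)$ such that $x^*(t_1) = y_2(t_1)$,
$x^*(t_1 + \Delta) = y_1(t_1 + \Delta)$, $x(t_1) = y_1(t_1)$,
$x(t_1 + \Delta) = y_2(t_1 + \Delta)$. From the principle of optimality, $x^*(t)$
and $x(t)$ ($t \in [t_1, t_1 + \Delta]$) are both optimal with respect
to their corresponding end points. Now from Condition 1,
we have

$$\begin{aligned} C(y_1(t), t_1, \Delta) &< C(x(t), t_1, \Delta) + \mathcal{C}\Delta \\ C(y_2(t), t_1, \Delta) &< C(x^*(t), t_1, \Delta) + \mathcal{C}\Delta \end{aligned} \tag{9}$$

where $\mathcal{C}$ is the constant of Condition 1.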

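Since $x^*(t)$ ($t \in [0, t_1 + \Delta]$) and $x^*(t)$ ($t \in [t_1, T]$) are two
unique local optimal trajectories, we have

$$C(x^*(t), 0, t_1) + C(x^*(t), t_1, \Delta) < C(x(t), 0, t_1) + C(y_1(t), t_1, \Delta) \tag{10}$$

$$C(x^*(t), t_1, \Delta) + C(x^*(t), t_1 + \Delta, T - t_1 - \Delta) < C(x(t), t_1 + \Delta, T - t_1 - \Delta) + C(y_2(t), t_1, \Delta) \tag{11}$$

Combining (9)-(11) leads to

$$C(x^*(t), 0, T) < C(x(t), 0, T) + 2\mathcal{C}\Delta \tag{12}$$

with $\mathcal{C}$ the constant of Condition 1.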

The cost $C(x(t), 0, T)$ was assumed to be less than
$C(x^*(t), 0, T)$, but if we choose $\Delta$ so that

$$0 < \Delta < \frac{C(x^*(t), 0, T) - C(x(t), 0, T)}{2\mathcal{C}}$$

with $\mathcal{C}$ the constant of Condition 1, we see that (12) cannot hold. This is a contradiction because
$\Delta$ can be arbitrarily small. Hence $x^*(t)$ ($t \in [0, T]$) must
be a local optimum. ■

The  next  theorem  is  an  immediate  consequence  of  the 
above  lemmas. 

Theorem 1: Suppose a group of agents $\{x_k\}$ evolve
under sampled local pursuit and at each updating time
$t = t_k + i\delta$, the locally optimal trajectory from $x_k(t)$ to
$x_{k-1}(t)$ is unique. If the updating interval $\delta$ and following
interval $\Delta$ satisfy $0 < \delta < \Delta$ and Condition 1 holds, then
the trajectory sequence $\{x_k\}$ converges to a unique local
optimum. Furthermore, if the locally optimal trajectories
from each follower to its leader are smooth, the limiting
trajectory is also smooth.

Proof: From Lemma 2, the limiting trajectory is
unique. We know that $x_\infty(t)$ ($t \in [0, \Delta)$) and $x_\infty(t)$
($t \in [\delta, \delta + \Delta)$) are locally optimal, because the realized and
planned trajectories overlap (Lemma 3). The optimality of
$x_\infty(t)$ ($t \in [0, \delta + \Delta)$) follows from Lemma 4. Repeating
this argument on $[i\delta, i\delta + \Delta]$ ($i = 0, 1, 2, \ldots$) leads to the
result that $x_\infty(t)$ ($t \in [0, T]$) is locally optimal. The proof
of smoothness follows from a similar argument. ■

Remarks: SLP is a cooperative, decentralized algorithm
for learning optimal controls/trajectories, starting
from a feasible solution. Each agent is only required to
calculate optimal trajectories from its own state to that of
its leader. Because agents are separated by $\Delta$ time units as
they leave $x_0$, each agent relies on local information only in
order to follow its predecessor and requires no knowledge of
the global geometry. There is no need for agents to exchange
or “fuse” local maps. Agents do not need to communicate
their choice of coordinate systems as they evolve, nor do
they need to know the coordinates of $x_f$. While it is possible
that a group of agents could disperse and construct a
global map from local information, such an approach would
require significantly more computation and communication
than SLP. SLP solves the optimal control problem in “short
pieces”, which makes it appropriate for systems with short-range
sensors (for example, in the case of a swarm of robots
exploring unknown terrain). Each agent solves a finite
number of instances of the optimal control problem, with
initial and final states which are “close” to one another; if a
closed form solution is not available, SLP generally requires
significantly fewer computations compared to solving the
problem (numerically) from $x_0$ to $x_f$. The above implies
that SLP is most useful when the optimal control problem
is easier to solve over “short” distances.

We  have  assumed  a  countable  infinity  of  agents;  of 
course,  such  a  collection  cannot  be  realized.  It  is  however 
possible  to  achieve  the  same  results  with  a  finite  number  of 
agents that apply SLP to reach $x_f$ from $x_0$, then return to
$x_0$ (perhaps using SLP again). The required modifications
are  straightforward  but  will  not  be  discussed  here  as  they 
are  beyond  the  scope  of  this  paper.  Finally,  SLP  is  not 
guaranteed  to  converge  to  the  global  optimum.  The  choice 
of  agent  separation  and  updating  interval  can  affect  whether 
the  limiting  trajectory  is  a  local  or  a  global  optimum.  Some 
interesting  cases  involving  spaces  with  holes  or  obstacles 
are  discussed  in  [16]. 

IV.  EXAMPLES 

First,  consider  a  group  of  systems  governed  by  x(t)  + 
x(t) = u(t), where we want to minimize $\int_0^1 \left(x^2(t) + u^2(t)\right) dt$
with $x(0) = 0$, $x(1) = 1$ and $\dot{x}(0) = 0$, $\dot{x}(1) = 0$.
With parameter values $\Delta = 0.5$, $\delta = 0.25$, the iterated
trajectories produced by SLP converged to the optimum, as
illustrated in Fig. 4.


Fig. 4. Applying SLP in a Lagrangian problem. The initial trajectory is
deliberately suboptimal, while the iterated trajectories converge
rapidly to the optimum with the parameter selection $\Delta = 0.5$, $\delta = 0.25$.

Second,  we  consider  a  “geodesic  discovery”  problem  on 
an  environment  consisting  of  a  plane  with  two  right  cones, 
whose top view is shown in Fig. 5. The radii and heights of
the cones were 800 and 1000 units of length, respectively.
The agents were governed by $\dot{x}_k = u_k$ and were required
to travel from $x_0 = (3500, 0, 0)$ to $x_f = (-1300, 0, 0)$.
Minimum-length paths are difficult to compute in this
setting because they involve optimal switching between
different coordinate patches (those of the plane and the two
cones). By applying SLP with $T = 1000$, $\Delta = 200$, $\delta = 1$,
followers need to calculate locally optimal trajectories on at
most two coordinate systems, thus reducing the complexity
of the problem. Figure 5 illustrates the iterated trajectories.
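As a rough illustration of the per-agent workload implied by these parameter choices, the sketch below counts the local optimal control problems one agent solves. It assumes one problem per updating time while $\Delta + i\delta < T$ holds (strict inequality, as in step 2 of the algorithm), plus a single final problem that takes the agent to the final state; the function name is hypothetical and the count ignores any redundant re-planning after the leader has stopped.

```python
def local_problems_per_agent(T, Delta, delta):
    """Number of local problems solved by one agent under the stated assumptions."""
    i = 0
    while Delta + i * delta < T:  # leader still moving at updating time t_k + i*delta
        i += 1
    return i + 1                  # plus the final problem that ends at the target

cone_example = local_problems_per_agent(1000, 200, 1)         # second example
lagrangian_example = local_problems_per_agent(1.0, 0.5, 0.25)  # first example
print(cone_example, lagrangian_example)  # prints: 801 3
```

Each of these problems spans at most $\Delta$ units of time, which is the sense in which the agents only ever solve “short” problems.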


Fig. 5. Sampled local pursuit in a complex environment. The initial
trajectory (along the borders of the cones) is easily found but far away from
the optimum. The locally optimal trajectories are much easier to find than
the global optimum because we limit the pursuit distance by selecting
$\Delta = 0.2T$. The iterated trajectories converge to the local optimum.


V.  CONCLUSIONS  AND  ONGOING  WORK 

We have explored a biologically-inspired cooperative
strategy (termed “Sampled Local Pursuit”) for solving a
class of optimal control problems with fixed final time and
state. The proposed algorithm generalizes previous models
that mimic the foraging behavior of ant colonies. It allows
a collective to discover optimal controls, starting from an
initial suboptimal solution. Members of the collective are
only required to obtain local information on their environment
and to calculate optimal trajectories to their nearby
neighbors. The proposed algorithm relies on cooperation to
perform a task which would be difficult or impossible for a
single system, namely solving an optimal control problem
with little information (in terms of coordinate systems that
describe the environment or the coordinates of the final
state) and short-range sensing.

There are several natural extensions of this work, including
broadening its scope to include problems with
free final time and state, and investigating the algorithm’s
convergence rate, as well as its ability to lead to global
(as opposed to local) optima by choice of the algorithm’s
parameters.